18.05 Lecture 34 
May 6, 2005 



Contingency tables, test of independence. 
Random Sample:

$X_1 = (X_1^1, X_1^2), \ldots, X_n = (X_n^1, X_n^2)$
Question: Are $X^1, X^2$ independent?

Example: When asked if your finances are better, worse, or the same as last year, 
see if the answer depends on income range: 





             Worse   Same   Better
< 20K          20     15      12
20K - 30K      24     27      32
> 30K          14     22      23



Check if the differences and subtle trend are significant or random. 

$\theta_{ij} = P(i, j) = P(i) \times P(j)$ if independent, for all cells $(i, j)$

Independence hypothesis can be written as:

$H_1: \theta_{ij} = p_i q_j$, where $p_1 + \ldots + p_a = 1$, $q_1 + \ldots + q_b = 1$

$H_2$: otherwise.

r = number of categories = ab 

s = dimension of parameter set = a + b - 2

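The MLE $p^*, q^*$ needs to be found. With $N_{ij}$ the number of observations in cell $(i, j)$, the test statistic is then

$T = \sum_{i,j} \frac{(N_{ij} - n p_i^* q_j^*)^2}{n p_i^* q_j^*} \sim \chi^2_{r-s-1}$

$r - s - 1 = ab - (a + b - 2) - 1 = (a - 1)(b - 1)$

The distribution has $(a - 1)(b - 1)$ degrees of freedom.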



Likelihood: 



$\ell(p, q) = \prod_{i,j} (p_i q_j)^{N_{ij}} = \prod_i p_i^{N_{i+}} \times \prod_j q_j^{N_{+j}}$

Note: $N_{i+} = \sum_j N_{ij}$ and $N_{+j} = \sum_i N_{ij}$
Maximize each factor to maximize the product.

$\sum_i N_{i+} \log p_i \to \max$, subject to $p_1 + \ldots + p_a = 1$

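Use Lagrange multipliers to solve the constrained maximization:
$\sum_i N_{i+} \log p_i - \lambda \left( \sum_i p_i - 1 \right) \to \max_p \min_\lambda$
Setting the derivative in $p_i$ to zero gives $N_{i+} / p_i = \lambda$; the constraint then forces $\lambda = n$, so $p_i^* = N_{i+} / n$ and, by the same argument, $q_j^* = N_{+j} / n$.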
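The constrained maximization above has the standard closed-form solution $p_i = N_{i+} / n$. The following is a minimal sketch that checks this numerically for the row totals of the income example; the random search over the simplex and the names `log_likelihood`, `p_star`, and `best_random` are illustrative assumptions, not part of the lecture.

```python
import math
import random

# Row totals N_{i+} from the income example in these notes.
row_totals = [47, 83, 59]
n = sum(row_totals)

def log_likelihood(p):
    """Row factor of the log-likelihood: sum_i N_{i+} * log(p_i)."""
    return sum(N * math.log(p_i) for N, p_i in zip(row_totals, p))

# Candidate maximizer from the Lagrange condition N_{i+} / p_i = lambda, lambda = n.
p_star = [N / n for N in row_totals]

# Compare against random points on the simplex (seeded for reproducibility).
rng = random.Random(0)
best_random = -math.inf
for _ in range(10_000):
    w = [rng.random() + 1e-12 for _ in row_totals]  # small offset avoids log(0)
    total = sum(w)
    p = [x / total for x in w]
    best_random = max(best_random, log_likelihood(p))

print([round(x, 4) for x in p_star])
print(log_likelihood(p_star) >= best_random)  # no random point should do better
```

A random search cannot prove optimality; it only illustrates that no sampled point on the simplex beats $N_{i+} / n$.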







Decision Rule: 

$\delta = \{ H_1 : T \le c; \; H_2 : T > c \}$

Choose c from the chi-square distribution with $(a - 1)(b - 1)$ degrees of freedom so that the area to the right of c equals the level of significance $\alpha$.

From the above example: 
$N_{1+} = 47$, $N_{2+} = 83$, $N_{3+} = 59$
$N_{+1} = 58$, $N_{+2} = 64$, $N_{+3} = 67$
$n = 189$
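As a quick check of these totals, here is a short sketch that recomputes the marginals from the income table and forms the expected cell counts under independence, $N_{i+} N_{+j} / n$ (the value of $n p_i^* q_j^*$ when the MLEs are the marginal proportions). The variable names are illustrative.

```python
# Observed counts N_ij: rows are income ranges, columns are Worse / Same / Better.
observed = [
    [20, 15, 12],   # < 20K
    [24, 27, 32],   # 20K - 30K
    [14, 22, 23],   # > 30K
]

row_totals = [sum(row) for row in observed]         # N_{i+}
col_totals = [sum(col) for col in zip(*observed)]   # N_{+j}
n = sum(row_totals)

# Expected count in cell (i, j) under independence: N_{i+} * N_{+j} / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

print(row_totals, col_totals, n)   # [47, 83, 59] [58, 64, 67] 189
for row in expected:
    print([round(e, 2) for e in row])
```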

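For each cell, the component of the T statistic adds as follows:

$T = \frac{(20 - \frac{47 \cdot 58}{189})^2}{\frac{47 \cdot 58}{189}} + \ldots + \frac{(23 - \frac{59 \cdot 67}{189})^2}{\frac{59 \cdot 67}{189}} = 5.210$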



Is T too large? 

$T \sim \chi^2_{(3-1)(3-1)} = \chi^2_4$

For this distribution, c = 9.488 at the level of significance $\alpha = 0.05$.

According to the decision rule, accept $H_1$, because 5.210 < 9.488.
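The whole calculation can be reproduced in a few lines. This is a sketch, not the lecture's own code: the function names are illustrative, and the tail formula $P(X > x) = e^{-x/2}(1 + x/2)$ is the closed form for a chi-square distribution with exactly 4 degrees of freedom, used here only to confirm that c = 9.488 corresponds to a 5% level.

```python
import math

# Observed counts N_ij from the income example.
observed = [
    [20, 15, 12],
    [24, 27, 32],
    [14, 22, 23],
]

def chi_square_statistic(table):
    """T = sum over cells of (N_ij - E_ij)^2 / E_ij, with E_ij = N_{i+} N_{+j} / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    t = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n
            t += (n_ij - e_ij) ** 2 / e_ij
    return t

def chi2_4_upper_tail(x):
    """P(X > x) for a chi-square variable with 4 d.o.f. (valid for 4 d.o.f. only)."""
    return math.exp(-x / 2) * (1 + x / 2)

T = chi_square_statistic(observed)
c = 9.488  # threshold quoted in the notes

print(round(T, 3))                      # about 5.21
print(round(chi2_4_upper_tail(c), 3))   # about 0.05
print("accept H1" if T <= c else "reject H1")
```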
Test of Homogeneity - very similar to independence test.

            Category 1    ...    Category b
Group 1     $N_{11}$      ...    $N_{1b}$
...         ...           ...    ...
Group a     $N_{a1}$      ...    $N_{ab}$

1. Sample from entire population. 

2. Sample from each group separately, independently between the groups. 

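Question: Is P(category j | group i) = P(category j)?
This is the same as independence testing!

P(category j, group i) = P(category j)P(group i)

Writing $C_j$ for category j and $G_i$ for group i:

$P(C_j | G_i) = \frac{P(C_j, G_i)}{P(G_i)} = \frac{P(C_j) P(G_i)}{P(G_i)} = P(C_j)$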



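Consider a situation where group 1 is 99% of the population and group 2 is 1%.
A sample from the entire population would contain very few observations from group 2, so you would be better off sampling each group separately and independently.
Say you sample 100 from each group; you then just need to renormalize by each group's share of the population.
The test now becomes a test of independence.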
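One way to read the renormalization step is to weight each group's within-group frequencies by that group's population share. The sketch below illustrates this reading; the counts in `sample_counts` are hypothetical numbers invented for illustration, and only the 99% / 1% shares come from the notes.

```python
# Population shares of the two groups, as in the example above.
group_share = [0.99, 0.01]

# HYPOTHETICAL counts: 100 people sampled from each group, two categories.
sample_counts = [
    [60, 40],   # group 1
    [30, 70],   # group 2
]

# Within-group estimates of P(category j | group i).
conditional = [[c / sum(row) for c in row] for row in sample_counts]

# Renormalize: P(category j, group i) = P(group i) * P(category j | group i).
joint = [[share * p for p in row] for share, row in zip(group_share, conditional)]

# Population-wide P(category j) is the column sum of the joint table.
marginal = [sum(col) for col in zip(*joint)]

print([round(m, 3) for m in marginal])  # [0.597, 0.403]
```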

Example: pg. 560 

100 people were asked if service by a fire station was satisfactory or not. 
Then, after a fire occurred, the people were asked again.
See if the opinion changed in the same people. 



               Satisfied   Unsatisfied
Before Fire       80           20
After Fire        72           28



But you can't use this table if you are asking the same people: the two rows are not independent samples!
Better way to arrange: 



                          After, Satisfied   After, Not Satisfied
Originally Satisfied             70                  10
Originally Unsatisfied            2                  18



If taken from the entire population, this is ok. Otherwise you are taking from a dependent population. 
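As an illustration only, the rearranged table can be run through the same chi-square computation, assuming the 100 people were sampled from the entire population. The notes do not carry out this calculation; the tail formula $P(X > x) = \mathrm{erfc}(\sqrt{x/2})$ is the closed form for a chi-square distribution with 1 degree of freedom, and the variable names are illustrative.

```python
import math

# Rows: originally satisfied / originally unsatisfied.
# Columns: satisfied / not satisfied after the fire.
observed = [
    [70, 10],
    [2, 18],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-square statistic with expected counts N_{i+} N_{+j} / n.
T = 0.0
for i, row in enumerate(observed):
    for j, n_ij in enumerate(row):
        e_ij = row_totals[i] * col_totals[j] / n
        T += (n_ij - e_ij) ** 2 / e_ij

# (2 - 1)(2 - 1) = 1 degree of freedom; for 1 d.o.f. P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(T / 2))

print(row_totals, col_totals)  # [80, 20] [72, 28]
print(round(T, 2))             # about 47.67
print(p_value < 0.05)
```

The row and column totals reproduce the before/after table, which is a useful consistency check on the rearrangement.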



End of Lecture 34 